Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files
Authors
Abstract
Chart recognition from PDF files is a relatively young research field in which techniques and algorithms are proposed to identify chart types and interpret them. This paper focuses on recognizing the type of a chart that is part of a PDF document, using texture features and classification algorithms. Eleven types of texture features and three classifiers, namely multilayer perceptron (MLP), support vector machine (SVM), and k-nearest neighbour (KNN), are used. Performance analysis of the proposed chart type recognition system shows that texture features are promising for chart type recognition and produce the best results with the KNN and SVM classifiers.
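As a rough illustration of the pipeline the abstract describes (texture features fed to a classifier), the sketch below computes three gray-level co-occurrence (GLCM) statistics from a tiny quantized image and classifies it with a plain k-nearest-neighbour vote. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not name its eleven texture features, so contrast, energy, and homogeneity are stand-ins chosen here, and the function names, the 4x4 toy images, and the `bar_like` / `pie_like` labels are all hypothetical.

```python
from collections import Counter
from math import dist


def glcm_features(img, levels=4):
    """Return (contrast, energy, homogeneity) from a symmetric GLCM
    built over horizontally adjacent pixels.

    `img` is a list of rows of integer gray levels in range(levels).
    The choice of these three statistics is an assumption; the source
    abstract does not list its texture features.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image needs at least two pixels per row")
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total  # normalized co-occurrence probability
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return (contrast, energy, homogeneity)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`
    (Euclidean distance). `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def striped(a, b):
    """Hypothetical toy image: alternating columns, a crude stand-in
    for a region with strong vertical edges."""
    return [[a, b, a, b] for _ in range(4)]


def flat(a):
    """Hypothetical toy image: one uniform gray level."""
    return [[a] * 4 for _ in range(4)]


# Toy training set; the labels are illustrative only.
TRAIN = [
    (glcm_features(striped(0, 3)), "bar_like"),
    (glcm_features(striped(0, 2)), "bar_like"),
    (glcm_features(striped(1, 3)), "bar_like"),
    (glcm_features(flat(1)), "pie_like"),
    (glcm_features(flat(2)), "pie_like"),
    (glcm_features([[0, 0, 1, 1] for _ in range(4)]), "pie_like"),
]


def classify(img, k=3):
    """Extract texture features from `img` and classify them with KNN."""
    return knn_predict(TRAIN, glcm_features(img), k)


if __name__ == "__main__":
    print(classify(striped(3, 0)))
    print(classify(flat(3)))
```

In this sketch the features are left unscaled, so contrast dominates the distance; a real system would normally standardize each feature before KNN or SVM training and would extract features from rendered chart images rather than hand-built grids.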
Similar resources
Evaluating the Efficiency of Rule Techniques for File Classification
Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important techniqu...
Extraction, layout analysis and classification of diagrams in PDF documents
Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers. PDF files are parsed and their vector graphics com...
A Machine Learning Approach for Semantic Structuring of Scientific Charts in Scholarly Documents
Large scholarly repositories are designed to provide scientists and researchers with a wealth of information that is retrieved from data present in a variety of formats. A typical scholarly document contains information in a combined layout of texts and graphic images. Common types of graphics found in these documents are scientific charts that are used to represent data values in a visual form...
Mapping The Genetic Relationships of the World’s Languages
Dr. Stephen Huffman [Editor’s note: This paper describes a series of maps produced by Dr. Huffman using a pre-release version of GMI’s World Language Mapping System (WLMS) GIS data set. Most of these maps have been adjusted to use the current released version of the WLMS along with GMI’s Seamless Digital Chart of the World. Images of the maps, Portable Document File (PDF) files of ...
Using Steganography to hide messages inside PDF files
Steganography focuses on hiding information in such a way that the message is undetectable to outsiders and is apparent only to the sender and the intended recipient. Portable Document Format (PDF) steganography has not received as much attention as other techniques like image steganography because of the lower capacity and text-based file format, which make it harder to hide data. However, some approa...